Spark Protocol
This document aims to describe the entire workflow of SPARK retrieval checks, based on the parts discussed in , and other related documents. See also
Table of Contents
Context
SPARK will operate within the MERidian framework. We want to be decentralised, with no single party in charge of any component.
Overview
At a high level, the protocol is split into the following steps:
- Tasking: Cluster the online SPARK checkers into several committees.
- CID Sampling: Choose a random (CID, SP) pair for each committee.
- Retrieval with Attestation: Retrieve CID content from SP and obtain the attestation token.
- Proof of Data Possession: Create proof that the entire CID content was retrieved.
- Measurement: Report the job outcome to MERidian.
- Evaluation & Verification: Evaluate the impact of checkers and detect fraud.
The last phase - Reward - is handled by MERidian smart contracts.
1) Tasking
In this step, we want to match running SPARK checker instances with retrieval jobs in such a way that makes fraud more difficult. For each retrieval job (a pair (CID, SP)), we want to form a small committee of peers to perform the same retrieval redundantly to arrive at an honest majority result.
In the rest of the section, a task represents an abstract but fixed retrieval check that checker nodes will perform. The tasking algorithm does not need to concern itself with what specific (CID, SP) pair is derived from each task, as long as we have a deterministic algorithm for that. The conversion from "task" to (CID, SP) job is explained in the section .
Requirements
- Rate Limiting
We want to rate-limit how many jobs are performed by each checker.
- This is especially important for the LabWeek23 release, when we won't have retrieval fraud detection implemented yet. Without fraud detection, a fraudulent node can short-circuit job execution by skipping the retrieval and reporting fake results instead. This allows nodes to cheaply report many completed jobs, which later translates into a disproportionately large impact & reward.
- Even with fraud detection in place, if we don't rate-limit the checks, then nodes on fast, unmetered internet connections can tweak the SPARK checker code to perform more checks than we designed for. This can put too much pressure on SPs. Plus, the operator will get a larger portion of rewards, which is unfair to honest nodes.
🤔 Miroslav
After describing this requirement in detail, I am no longer sure if it's really required in the longer term. What can go wrong if we allow nodes to make many honest retrieval checks?
- The node operator creates more impact and thus receives more rewards. That's the core of the MERidian scheme, right?
- Node operators download much more data from SPs. If we sample CIDs uniformly, the load should be spread across SPs based on how much data they store. Any single SP can be overloaded only if it provides a large fraction of FIL capacity. If that's the case, it must be prepared to handle more retrievals than other SPs providing less capacity.
- Make Sybil attacks difficult/expensive
We want to limit how many checker instances a single party operates by disincentivizing running too many instances.
- The primary goal is to make Sybil attacks more difficult/expensive.
- As a side effect, this also makes it harder to evade rate limiting by running many SPARK instances on the same machine.
- Make SP collusion difficult
We want to make it difficult to run colluding SPARK nodes that will limit the retrieval checks only to a given SP or perhaps a small fixed set of CIDs.
- Checker committees
Our "Proof of Retrieval" solution requires a committee in which an honest majority can be formed.
- Avoid DDoSing SPs
We must spread the load across multiple SPs. If the entire SPARK network checks the same (CID, SP) pair, we will effectively DDoS the poor SP.
See also
Tasking Algorithm
We want to use IPv4 as a scarce resource to make it more difficult & expensive for individual operators to run hundreds/thousands of SPARK nodes to gain control of the network.
Each measurement epoch is deterministically tied to a DRAND epoch so that we can use the DRAND signature as the source of randomness.
We want to have a list of currently active nodes to lower the probability of selecting offline nodes for a committee.
Decentralised Design
We assume a network of tasker nodes will be operated by the community. Initially, there will be only one tasker node run by PL. Later, we will design & implement a reputation and incentive system that will allow more parties to join the network. E.g. a new tasker node stakes some FIL to join, the network penalises misbehaving nodes by slashing their stake, and finally, SPARK diverts a part of the reward pool to pay tasker nodes for their service.
Let's assume we want to define a new set of tasks once per DRAND epoch (every 30 seconds).
TL;DR:
- The tasking algorithm has two inputs:
- The DRAND signature for the current epoch as the source of randomness
- A list of checker nodes active at the beginning of the current epoch, including their public IPv4 addresses.
- The outputs are deterministically computed from those inputs.
- A list of committees.
- For each committee, a list of checkers belonging to the committee.
- Commitments and inclusion proofs as needed by other SPARK components.
- This reduces the problem of tasking decentralisation to the following question:
- How can we maintain a list of currently active checkers & their IP addresses in a decentralised way that the community & we can trust?
Membership service
This section answers the question asked above: how can we maintain a list of currently active checkers & their IP addresses in a decentralised way that the community & we can trust?
Let's recall that we assume the existence of a network of tasker nodes operated by the community.
- Each checker node will periodically send a heartbeat message to the tasker network.
- There are different ways to implement this. For example, there can be a DHT table of all tasker nodes, and the checker node selects the closest tasker node. Another option is gossip-style broadcasting.
- For the initial release, the network will have only one node operated by PL, and the checker nodes will contain hardcoded configuration pointing to this service.
- The tasker nodes will build a shared representation of what checker nodes are currently up and running.
- Again, there are different ways to implement this. What we want is a decentralised key/value store with expiration.
- For each checker node, we need to record the following fields:
- identity - node's public key
- address - node's public IPv4 address (or the address of the NAT router)
- timestamp - time when the last heartbeat was received
- At the beginning of each epoch, the tasker network finds a consensus about the list of all active checker nodes.
We should design this scheme to allow individual checker nodes to verify that they were included in the list while protecting the privacy of other checker nodes in the network.
- The tasker network creates a membership commitment satisfying the following requirements:
- It allows the MERidian/SPARK evaluation step to verify that the committee selection was executed with the right input data.
For example, this can be implemented by storing the list of active members in CAS (content-addressable storage) and recording the CID on the chain. However, we must restrict access to the raw data (checkers' IPv4 addresses).
- It allows individual checker nodes to verify that they were included in the list of members used for task selection.
- It must preserve the privacy of checker nodes - their IPv4 addresses must not be open to the public.
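One way to satisfy all three requirements is a Merkle tree over salted hashes of the member records: the salt defeats rainbow-table enumeration of IPv4 addresses, the root serves as the public membership commitment, and each checker receives an inclusion proof for its own leaf. A minimal sketch in Python follows; sha256 stands in for whatever hash function the network settles on, and all names are illustrative, not part of the design:

```python
import hashlib

def leaf_hash(identity: bytes, address: bytes, salt: bytes) -> bytes:
    # Salting the leaf prevents enumerating & hashing all ~4 billion
    # possible IPv4 addresses (the rainbow-table concern).
    return hashlib.sha256(salt + identity + address).digest()

def _next_level(level):
    if len(level) % 2:
        level = level + [level[-1]]  # duplicate the last node on odd levels
    return [hashlib.sha256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)]

def merkle_root(leaves):
    level = list(leaves)
    while len(level) > 1:
        level = _next_level(level)
    return level[0]

def inclusion_proof(leaves, index):
    # Returns a list of (sibling_hash, current_node_is_right_child) pairs.
    proof, level = [], list(leaves)
    while len(level) > 1:
        padded = level + [level[-1]] if len(level) % 2 else level
        proof.append((padded[index ^ 1], index % 2))
        level = _next_level(level)
        index //= 2
    return proof

def verify_inclusion(leaf, proof, root):
    h = leaf
    for sibling, is_right in proof:
        h = hashlib.sha256(sibling + h).digest() if is_right \
            else hashlib.sha256(h + sibling).digest()
    return h == root
```

A checker that knows its own leaf preimage can verify its inclusion against the published root without learning any other member's IPv4 address.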
Selection of Committees
Inputs:
- List of all active checker nodes (provided by the previous step)
- DRAND signature as the source of randomness
Steps:
- At the beginning of each epoch, after the tasker network reaches a consensus about the list of online checker nodes, we can proceed to define tasks for this epoch.
There are different ways to implement this. One option is to choose a different leader for each epoch. The leader will define the tasks, and other nodes will check that the leader is not cheating. The leader gets the reward (mints a block reward).
For the initial release, the network will have only one node operated by PL, so we donāt need to worry about the details yet.
- Partition all active checkers into committees of fixed size so that:
- each IPv4 /24 subnet is in exactly one committee
- there is exactly one checker from each IPv4 /24 subnet selected to participate in the committee.
- the selection process creates a uniform distribution
- the selection uses the DRAND signature as the source of randomness.
- Create a tasking commitment to the list of committees and their members. Record this commitment in a place where the SPARK/MERidian evaluator can find it.
- For each checker elected into a committee, create an inclusion proof against the tasking commitment. This allows checkers and the SPARK/MERidian evaluator to validate that the checker was assigned the task.
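The partitioning step above can be sketched as follows. The committee size, the field layout, and the use of Python's seeded PRNG as a stand-in for DRAND-derived randomness are all illustrative assumptions; commitment and inclusion-proof creation are omitted:

```python
import hashlib
import random

COMMITTEE_SIZE = 10  # assumed; the spec leaves the exact size to modelling

def select_committees(drand_signature: bytes, checkers):
    # checkers: list of (public_key, ipv4_address) for all active nodes.
    # 1. Group checkers by /24 subnet (the first three octets).
    subnets = {}
    for pubkey, ip in checkers:
        subnet = ip.rsplit(".", 1)[0]
        subnets.setdefault(subnet, []).append(pubkey)
    # 2. Seed a deterministic RNG from the DRAND signature, so every
    #    party recomputes the exact same committees from the same inputs.
    rng = random.Random(hashlib.sha256(drand_signature).digest())
    # 3. Pick exactly one checker per /24 subnet, uniformly at random.
    #    Sorting makes the result independent of input ordering.
    picked = [rng.choice(sorted(members))
              for _, members in sorted(subnets.items())]
    # 4. Shuffle the per-subnet picks and cut them into fixed-size
    #    committees; leftover checkers get no task this epoch.
    rng.shuffle(picked)
    return [picked[i:i + COMMITTEE_SIZE]
            for i in range(0, len(picked) - COMMITTEE_SIZE + 1, COMMITTEE_SIZE)]
```

Because both inputs (DRAND signature, member list) are shared, any verifier can re-run this function and check the published tasking commitment against its output.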
- At regular intervals, checkers ask for a task to perform.
- The checker can communicate with the same tasker node it sends heartbeats to.
- For the initial release, the network will have only one node operated by PL, and the checker nodes will contain hardcoded configuration pointing to this service.
- The tasker node returns one of the following two responses:
- No new task is scheduled for the checker in this epoch.
- Case 1: The checker already performed the task scheduled for this epoch.
- Case 2: No task was scheduled for this particular checker in the current epoch.
- Details of the task scheduled for the checker in the current epoch and the time until the next epoch starts (X seconds).
Task fields:
- DRAND signature for this epoch
- Tasking commitment, the committee id this checker belongs to, and the corresponding inclusion proof.
- The committee id is computed deterministically from the list of committee members (their public keys).
- The inclusion proof allows other components in the system to verify that the checker was entitled to perform this task in this epoch.
In both cases, the response will also contain the following fields:
- Time until the next epoch starts, e.g. X seconds.
- Commitment to the list of all active checker nodes. It allows the checker to verify that it was considered for the task selection algorithm. Alternatively, maybe the tasking service should provide inclusion proof instead?
In both cases, the checker will repeat step 2 after X seconds as instructed by the service.
Outputs
The ātasker nodesā know:
- List of active members for the current epoch
The ānetworkā knows - this knowledge is shared by ātasker nodesā and the evaluation/fraud detection service, e.g. via on-chain smart contract state:
- Membership commitment to the list of all active members
- DRAND signature for the current epoch
- Tasking commitment to the definition of committees/tasks.
Each tasked checker knows:
- DRAND signature for the current epoch
- Tasking commitment to the definition of committees/tasks
- The committee id this checker belongs to
- Task inclusion proof allowing 3rd parties to verify that this node belongs to its committee as specified by the id and that this committee is included in the tasking commitment.
Concerns
Proof of IPv4 address
- Not a problem initially when there is only one tasker node operated by PL.
- With a network of untrusted tasker nodes:
- When a checker sends the heartbeat message, the tasker node sees the checker's public IPv4.
- We can design a scheme allowing anybody to challenge the checker address reported by a particular tasker node. The process will start with forming an honest majority of tasker nodes, instructing the checker node to send a heartbeat to all tasker nodes, and then comparing the results. Of course, a checker node running on Station Desktop can be offline, so we need to account for that. If the honest majority of tasker nodes agrees on a different IPv4 for the given checker node, then we can slash the tasker that reported the incorrect address. However, we must also take into account that people change their IPv4 addresses over time.
- We can ask the checker node to always send the heartbeat message to multiple tasker nodes. That way, we don't need to trigger a challenge as long as all taskers report the same address.
We don't want to make IPv4 addresses public. Hashing an address is not enough because it's trivial to enumerate & hash all possible IPv4 addresses to create a rainbow table.
- In the current design, IPv4 addresses are private to tasker nodes.
- We may want to research options that will make it difficult or expensive for tasker nodes to expose checkers' addresses.
- I think this ultimately boils down to the fact that you have to trust the remote parties you are connecting to.
Membership information that's up-to-date but does not require the frequent invocation of a membership smart contract.
- We don't need the membership information to be on the chain. In the current design, membership is tracked by a network of tasker nodes and does not require any checkers to interact with any chains.
What would be a stop-gap solution we can implement faster to buy us more time for implementing the proper decentralised one?
- Using the proposed design, reduce the network of tasker nodes to a single node operated by PL.
How can checker nodes obtain the tasks defined by the decentralised orchestrator?
- Checker nodes are registered with tasker nodes. Checker nodes periodically ask their tasker node for a new task.
How to handle the initial state when a checker node does not have any FIL to pay for the gas fee and thus cannot invoke any smart contracts?
- Not applicable; checker nodes do not need to interact with any chains.
2) CID Sampling (per each task)
Prior discussion:
This algorithm is executed for every task defined above.
- Checker nodes execute the algorithm to find out what retrieval they should check.
- The evaluation service may run this algorithm to verify that the checkers selected the right retrieval. In practice, this is required only if there is a disagreement among members of the same committee.
Inputs:
- The committee ID to execute the task
- The DRAND signature for the current epoch
How to choose a random (CID, SP) pair - implementation details to be added soon-ish.
- Initialise the random number generator using the inputs as the seed. E.g.
rand = hash(concat(drand_sig, committee_id))
- Obtain the current tipset of Filecoinās blockchain. (It may be easier to obtain the previous tipset in a way we can trust the result.)
- From this tipset, get the set of active deals. 🤔 Somebody in the FIL ecosystem is maintaining a snapshot of active deals on Amazon S3. The file can be downloaded from https://marketdeals.s3.amazonaws.com/StateMarketDeals.json.zst; its size is over 3GB.
Depending on that single file creates a single point of failure, though.
We cannot expect SPARK checker nodes (Station Desktop instances) to download the entire StateMarketDeals.json.zst file to be able to perform CID sampling.
- Pick a random deal (using the random number generator created in step 1). Obtain PieceCID and SP identity for that deal.
- Obtain the multiaddr of the Boost HTTP API and the identity (public key) of the Boost worker registered for the selected SP. If we cannot get that information, then flag the SP as non-conforming and repeat step 3.
- Get a random CID from the given PieceID (using the random number generator created in step 1).
- Ideally, we want each checker to do this on their own. E.g. choose a random range within PieceCID, ask SP for those bytes, then parse the data to find CIDs, then pick one of them at random.
- An easier, although more centralised, option is to query the IPNI indexer HTTP API and request a random CID from PieceID using our current randomness seed.
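The seed derivation and deterministic deal selection from the steps above could look like the following sketch. The deal-record field names are assumptions for illustration, and sha256 is a stand-in hash:

```python
import hashlib

def sample_deal(drand_signature: bytes, committee_id: bytes, active_deals):
    # Deterministic seed shared by all members of the committee:
    #   rand = hash(concat(drand_sig, committee_id))
    seed = hashlib.sha256(drand_signature + committee_id).digest()
    # Interpret the digest as a big-endian integer to pick a deal index,
    # giving a (near-)uniform choice over the active deal set.
    index = int.from_bytes(seed, "big") % len(active_deals)
    deal = active_deals[index]
    return deal["piece_cid"], deal["provider"]
```

Every member of the same committee arrives at the same (CID, SP) pair, which is exactly what the per-committee evaluation step later relies on.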
Outputs (per task)
- CID to retrieve
- multiaddr of the Boost worker HTTP API to dial
- pubkey: the public key of the Boost worker identity
Shall we add tipset, so that the evaluation service knows all inputs for verifying the result?
3) Retrieval with Attestation
Prior discussion:
A retrieval-check job is defined as follows:
- CID to retrieve
- multiaddr of the Boost worker HTTP API to dial
- pubkey: the public key of the Boost worker identity
- rand: seed to use for randomness
Additionally, each checker node has a FIL wallet or a libp2p identity (a key pair).
- The SPARK checker retrieves the CID from the multiaddr and asks for a retrieval attestation (CAR metadata block).
- While reading the response body, it incrementally verifies the CAR stream.
- While reading the response body, it computes a Blake3 hash of the response bytes up to the CAR metadata block.
- After receiving the entire response, it parses car_length, the Blake3 hash, and the provider's signature from the CAR metadata block (see IPIP-431). Then it:
  - Verifies that car_length matches the length of the response up to the metadata block
  - Verifies that the Blake3 hash matches the hash computed locally
  - Verifies the signature
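The length and hash checks can be sketched as follows. sha256 stands in for Blake3 (which is not in the Python standard library), and verification of the SP's signature over the metadata block is omitted:

```python
import hashlib

def verify_attested_response(response: bytes, metadata_offset: int,
                             attested_length: int,
                             attested_hash: bytes) -> bool:
    # response: the full HTTP response body.
    # metadata_offset: where the CAR metadata block starts, i.e. the
    # attested portion is everything before it.
    car_bytes = response[:metadata_offset]
    if len(car_bytes) != attested_length:
        return False  # car_length does not match the attested value
    # In the real protocol this is a Blake3 hash computed incrementally
    # while streaming; sha256 over the buffered bytes is a stand-in.
    return hashlib.sha256(car_bytes).digest() == attested_hash
```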
Outputs (per job)
- car_length
- Blake3 hash
- Any fields from the retrieval attestation required to verify the SP signature
- SP signature
4) Proof of Data Possession
Using the data from the previous step, the SPARK Checker performs the following steps:
- It uses its private wallet key to sign the following message:
<rand><cid><multiaddr> and uses the signature to choose a random block in the Blake3 tree:
block_index = signature % (ROUND_UP(car_length/1024))
- It calculates a Blake3 inclusion proof for the 1024 bytes of the block at the block_index position.
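The block-selection formula can be sketched directly. Treating the signature bytes as a big-endian integer is one possible interpretation of signature % ROUND_UP(car_length/1024); the helper names are illustrative:

```python
import math

BLOCK_SIZE = 1024  # bytes per Blake3 leaf block, per the formula above

def choose_block_index(signature: bytes, car_length: int) -> int:
    # block_index = signature % ROUND_UP(car_length / 1024)
    num_blocks = math.ceil(car_length / BLOCK_SIZE)
    return int.from_bytes(signature, "big") % num_blocks

def block_to_prove(car_bytes: bytes, signature: bytes):
    # Returns the index and the raw bytes the checker must cover
    # with its Blake3 inclusion proof.
    index = choose_block_index(signature, len(car_bytes))
    return index, car_bytes[index * BLOCK_SIZE:(index + 1) * BLOCK_SIZE]
```

Because the index is derived from the checker's own signature, each committee member must prove possession of a different, unpredictable block.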
Open questions
- Do we need to prevent one station from asking another station to generate the inclusion proof after a retrieval, so that it does not need to do the full retrieval itself? If so, how?
Outputs (per job)
- signature over <rand><cid><multiaddr>
- Blake3 inclusion proof for block_index
5) Measurement
Using the data from the previous steps, the SPARK Checker submits the job outcome to the SPARK/MERidian measurement service.
Implementation details of the submission process are covered by MERidian, see e.g.
Besides the retrieval telemetry like response status code and TTFB, it also uploads the following fields:
- DRAND signature for the current epoch (or perhaps just the DRAND epoch number?)
- Data from Tasking
- Tasking commitment to the definition of committees/tasks
- The committee id this checker belongs to
- Task inclusion proof allowing 3rd parties to verify that this node belongs to its committee as specified by the id and that this committee is included in the tasking commitment.
- Data from CID Sampling
- CID to retrieve
- multiaddr of the Boost worker HTTP API to dial
- pubkey: the public key of the Boost worker identity
- Data from Retrieval with Attestation
- car_length
- Blake3 hash
- Any fields from the retrieval attestation required to verify the SP signature
- SP signature
- Data from Proof of Data Possession
- Checker's signature over <rand><cid><multiaddr>
- Blake3 inclusion proof for the selected block of data.
- Checker's identity (the public key used for membership registration and PoDP)
Performance aspects
- The measurement size will be roughly 400 + CIP + B3IP bytes, where CIP is the committee inclusion proof size and B3IP is the Blake3 inclusion proof for a block of data.
- IIUC, B3IP = 1024 + O(log(S)), where S is the byte size of the retrieved content. We should limit the maximum number of bytes retrieved from SPs to put an upper bound on the size of this part.
- CIP = O(log(N)), where N is the number of checker nodes online. We should research more efficient inclusion proofs, e.g. vector commitments. If we cannot find a more efficient representation, then we need to adjust the design so that checker nodes do not need to submit inclusion proofs for task-selection verification.
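For intuition, here is a back-of-the-envelope estimate assuming binary-Merkle-style proofs with 32-byte sibling hashes; the constants are illustrative assumptions, not measured values:

```python
import math

HASH_SIZE = 32  # bytes per sibling hash; an assumption for illustration

def estimate_measurement_size(n_checkers: int, content_bytes: int) -> int:
    # CIP  ~ log2(N) sibling hashes (committee inclusion proof)
    cip = math.ceil(math.log2(max(n_checkers, 2))) * HASH_SIZE
    # B3IP ~ 1024 bytes of leaf data + log2(S / 1024) chaining values
    b3ip = 1024 + math.ceil(math.log2(max(content_bytes // 1024, 2))) * HASH_SIZE
    # 400 bytes covers the remaining fixed fields, per the estimate above.
    return 400 + cip + b3ip
```

Under these assumptions, 10,000 online checkers retrieving 1 MiB of content yield roughly 2.2 KB per measurement, with the 1024-byte proof block dominating.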
Additional considerations
- If we make the data publicly available on the chain, we must encrypt the job outcome using timelock encryption that will reveal the data only after the current measurement epoch is over.
- If the Blake3 inclusion proof is publicly available, then other nodes can use this inclusion proof to calculate inclusion proof for their data block without having access to the entire CAR stream. This is a problem only if we check the retrieval of the same CID repeatedly in different epochs, which is highly unlikely.
6) Evaluation & Verification
These evaluation steps are executed whenever a MERidian measurement service submits a batch of job reports.
Here we assume that the evaluation is performed off-chain, and the MERidian takes care of executing this computation in a decentralised way. Again, refer to for more details.
Certain parts can be verified only after we receive measurements from all members of a committee. MERidian triggers the first evaluation step for every batch of measurements submitted, and a batch of measurements may not cover the entire committee. Therefore, we need the evaluation process to be stateful: wait until all committee members report their measurements and only then perform the verification checks.
The steps below describe the happy path, where we can form an honest majority in every step and don't penalise misbehaving checkers.
Inputs
- DRAND signature or epoch number linked to the current SPARK epoch
- Tasking commitment created by the network of tasking nodes
- Measurement records as provided by MERidian Measurement service/smart-contract.
Per-measurement steps
These can be executed immediately.
- Verify that the tasking commitment reported by the checker node matches the commitment for the current epoch stored on-chain or within the tasker network.
Discard measurements that don't pass this check.
- Verify the task inclusion proof.
Discard measurements that don't pass this check.
Per committee steps
These steps must be executed after we collect all measurements for a committee.
At this point, thanks to per-measurement steps, we have the following guarantees:
- The measurement uses the correct DRAND signature for randomness
- The measurement "belongs" to the list of committees defined by the tasking network for the current epoch.
The process:
- Verify that all committee members arrived at the same job definition
(CID, multiaddr, pubkey)
In case of discrepancies, assume an honest majority to decide what's the right answer. Alternatively, or if there is no majority, re-run the CID sampling algorithm to compute the correct values ourselves. This is a potential DoS attack vector; we must limit how often we run this.
Discard measurements that don't pass this check.
- Compare the car_length, Blake3 hash, and provider signature reported by different checkers (committee members). Build an honest majority to gain confidence about the correct car_length and Blake3 hash.
Discard measurements that don't pass this check.
🤔 Open questions:
- How to handle the case when we cannot form any majority?
- How to distinguish faulty SPs from misbehaving checkers?
- What to do about reports where the signature does not match?
- Verify the validity of the provider's signature over the retrieval attestation. Note: by now, we have filtered out all measurements that disagree with the majority; all remaining measurements have the same SP signature, and therefore we can run the validation only once per committee.
Discard measurements that don't pass this check.
- For each measurement in this committee:
- Validate the checkerās signature
- Build <rand><cid><multiaddr> - we have this data in the measurement
- Validate the checker's signature over the payload - we have the public key in the measurement
Discard measurements that don't pass this check.
- Build
- Calculate block_index = signature % (ROUND_UP(car_length/1024))
- Validate the Blake3 inclusion proof for this block index
Discard measurements that don't pass this check.
- Validate the checkerās signature
Now we have a set of measurements where we can trust the nodes performed the task assigned by SPARK and retrieved the full content.
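The honest-majority comparison in the per-committee steps can be sketched as a simple tally requiring a strict majority; how to proceed when no majority forms is still an open question above, so this sketch just signals that case:

```python
from collections import Counter

def honest_majority(measurements):
    # measurements: one (car_length, blake3_hash, sp_signature) tuple
    # per committee member, as reported in their measurements.
    if not measurements:
        return None
    tally = Counter(measurements)
    value, votes = tally.most_common(1)[0]
    if votes * 2 <= len(measurements):
        return None  # no strict majority - escalate / re-check
    return value
```

Measurements that disagree with the returned value are the ones discarded by the subsequent checks.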
Output
- A filtered set of measurements we can feed into evaluate & reward functions.
Performance aspects
TBD: what is the computational complexity (esp. the time complexity) of the verification algorithm with respect to the number of checker nodes and the size of retrieved data?
My initial estimate: for every SPARK epoch, the time complexity is N*(log(N) + log(S)), where:
- N is the number of checker nodes online, equal to the number of measurements reported.
- log(N) comes from the task inclusion proof - we can optimise this by switching to e.g. vector commitments.
- S is the (mean average?) size of the data retrieved.
- log(S) comes from the Blake3 inclusion proof for data possession.
OUTDATED CONTENT (PREVIOUS ITERATION)
Node registration
For each pair (CID, SP) we want to check, we need to pick a subset of SPARK checkers to form a committee that will give us an honest majority. To do that, we need a list of currently running checkers (list of network members).
- We need the list of currently active nodes to lower the probability of selecting offline nodes for a committee.
Options:
- Long-polling HTTP requests: a checker sends a request to Spark Orchestrator. The Orchestrator keeps the connection open until there is a job to be performed by the checker. The list of all open HTTP connections gives us the members.
- Downsides: centralised, very poor scalability
- Heartbeat: a checker sends a heartbeat (ping) at a configured interval, e.g. every minute. The Orchestrator keeps a list of (member, timestamp) records. We filter out stale records (timestamp is older than one minute) to get the list of active members.
- Downsides: centralised, there is a delay in detecting offline nodes
- On-chain registration: a checker registers itself with a smart contract. I think this will have to be implemented as a heartbeat, too (?).
- Downsides: expensive to run, checkers must have FIL to pay gas fees
- Centralised registration with verification:
- Same as Heartbeat (option 2). For each epoch, the membership service publishes an inclusion proof or a commitment on the chain, allowing individual checker nodes to verify that they were considered for receiving a job.
Proposal
Implement the second solution (heartbeat) for the Q3/Q4 launch. Research a decentralised/web3 solution afterwards.
Job scheduling
Prior discussion:
At every measurement epoch, the Orchestrator (Tasker) schedules retrieval checks to be performed by the network of checker nodes.
We want to use IPv4 as a scarce resource to make it more difficult & expensive for individual operators to run hundreds/thousands of SPARK nodes to gain control of the network.
For each (CID, SP) job, we want to form a committee of nodes where each node performs the same job. This should give us a high degree of confidence that if the majority of nodes report the same results, itās the honest majority.
Each measurement epoch is deterministically tied to a DRAND epoch so that we can use the DRAND beacon as the source of randomness.
Proposed algorithm
This algorithm is executed at the beginning of every SPARK epoch to schedule jobs for this epoch.
Let's assume committee size CS=10. (We will increase this number later based on modelling.)
- Initialise the set of partitions P using data from the membership service, grouping checker nodes using the first three bytes of their IPv4 address (their x.y.z.0/24 subnet). Each partition is a list of nodes in the same subnet.
Note: We never add more members to P. Instead, we create a new P using fresh membership data when the next epoch starts.
- Repeat as long as P has at least CS items:
- Choose a random (CID, SP) pair using the CID sampling algorithm described below
- Choose CS random partitions from P
- For each chosen partition, pick a random member (checker) in this IPv4 subnet.
- Create a new job assignment (CID, SP, checker)
- Remove the chosen partitions from P
- In most cases, the number of member groups will not be divisible by CS, so we end up with some leftover groups that were not given any task to do this epoch. Alternatively, we can add these subnets to the last committee or spread them evenly across multiple committees.
CID sampling
How to choose a random (CID, SP) pair - implementation details to be added soon-ish.
- Obtain the current tipset of Filecoinās blockchain.
- From this tipset, get the set of active deals.
- Pick a random deal. Obtain PieceCID and SP identity for that deal.
- Obtain the multiaddr of the Boost HTTP API and the identity (public key) of the Boost worker registered for the selected SP. If we cannot get that information, then flag the SP as non-conforming and repeat step 3.
- Query the IPNI indexer HTTP API and request a random CID from PieceID using our current DRAND beacon.
Tasking checkers
Each checker node will periodically ask the Orchestrator for a new job via HTTP API. The orchestrator gives one of the following responses:
- No new job is scheduled for the client, and it should retry in X seconds.
- Case 1: The node already performed the job scheduled for this epoch.
- Case 2: No job was scheduled for this particular node in the current epoch.
- X is computed as the time until the next epoch starts
- Details of the job scheduled for the node in the current epoch
- If the node does not receive any job, it will repeat the request in X seconds as instructed by the scheduler.
Smart-contract-driven version
How can we remove the reliance on a centralised Orchestrator/Tasker and let a smart contract do the scheduling work? Can we defer this work after the initial release?
Retrieval check
Prior discussion:
A retrieval-check job is defined as follows:
- CID to retrieve
- multiaddr of the Boost worker HTTP API to dial
- pubkey: the public key of the Boost worker identity
- drand: seed to use for randomness
Additionally, each checker node has a FIL wallet (a key pair).
- The SPARK checker retrieves the CID from the multiaddr and asks for a retrieval attestation (CAR metadata block).
- While reading the response body, it incrementally verifies the CAR stream.
- While reading the response body, it computes a Blake3 hash of the response bytes up to the CAR metadata block.
- After receiving the entire response, it parses car_length, the Blake3 hash, and the provider's signature from the CAR metadata block (see IPIP-431). Then it:
  - Verifies that car_length matches the length of the response up to the metadata block
  - Verifies that the Blake3 hash matches the hash computed locally
  - Verifies the signature
- Next, it uses its private wallet key to sign the following message:
<drand><cid><multiaddr> and uses the signature to choose a random block in the Blake3 tree:
block_index = signature % (ROUND_UP(car_length/1024))
- It calculates a Blake3 inclusion proof for the 1024 bytes of the block at the block_index position.
- Finally, it reports the job outcome to SPARK Orchestrator. Besides the retrieval telemetry like response status code and TTFB, it also uploads the following fields:
- car_length, Blake3 hash and the provider's signature
- checker's signature over <drand><cid><multiaddr>
- Blake3 inclusion proof for the selected block of data.
Smart-contract version
- If we make the data publicly available on the chain, we must encrypt the job outcome using timelock encryption that will reveal the data only after the current measurement epoch is over.
- If the Blake3 inclusion proof is publicly available, then other nodes can use this inclusion proof to calculate inclusion proof for their data block without having access to the entire CAR stream. This is a problem only if we check the retrieval of the same CID repeatedly in different epochs.
Job verification
- After each measurement epoch is over, collect all job outcomes reported by SPARK checkers. For each report, verify that the provider signature matches the provider's public key.
- What to do about reports where the signature does not match?
- How to handle reports with missing signatures?
- How to distinguish faulty SPs from misbehaving checkers?
- For each (CID, multiaddr) pair scheduled for checking:
  - Compare the car_length, Blake3 hash and provider signature reported by different checkers (committee members).
    - Can we build an honest majority to have confidence about what is the correct car_length and Blake3 hash?
    - What to do if we cannot?
  - For each job report:
    - Validate the checker's signature
      - Build <drand><cid><multiaddr> - we have this data in the job definition
      - Validate the checker's signature over the payload - we have the signature in the job outcome; we will need to get the checker's public key (wallet address) from our other records, e.g. the membership service.
    - Validate the Blake3 inclusion proof
    - Filter out invalid reports
  - Now we have a set of retrieval metrics we can trust.
Public verifiability
How can we allow 3rd parties to repeat the process above using public data to verify that our Orchestrator service is being honest in the way it performs job verification?